Zero-Crossing-Based Channel Attentive Weighting of Cepstral Features for Robust Speech Recognition: The ETRI 2011 CHiME Challenge System
Abstract
We present a practical, noise-robust speech recognition system that estimates a target-to-interferers power ratio using a zero-crossing-based binaural model and applies that ratio to a channel attentive missing-feature decoder in the cepstral domain. In a natural multisource environment, the binaural model extracts spatial cues at each zero-crossing of a filterbank output signal to localize multiple sound sources, and it reliably estimates a ratio mask that segregates the target speech from interfering noise. The system uses gammatone filterbank cepstral coefficients (GFCCs) for recognition, and the channel attentive decoder uses the ratio mask to weight the cepstral features when calculating the output probability in Viterbi decoding. In experiments on the CHiME final test set, our channel attentive GFCC system improves on the baseline recognition result by 12.2% on average; under the noisy training condition, the average improvement reaches 18.8%.
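The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea of weighting features inside the output probability. It assumes a diagonal-covariance Gaussian state model and assumes that a reliability weight in [0, 1] is already available for each cepstral dimension; how the system derives such weights from the channel-domain ratio mask is not described here. The function name `weighted_log_likelihood` and the exponent-style weighting rule are illustrative assumptions, not the authors' implementation.

```python
import math


def weighted_log_likelihood(features, means, variances, weights):
    """Frame log-likelihood under a diagonal Gaussian, with each feature
    dimension's contribution scaled by a reliability weight in [0, 1].

    Assumption (not from the source): the weight acts as an exponent on the
    per-dimension density, i.e. a multiplier on its log-density.
    """
    if not (len(features) == len(means) == len(variances) == len(weights)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for x, mu, var, w in zip(features, means, variances, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        # Ordinary per-dimension Gaussian log-density.
        log_density = -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        # Hypothetical weighting rule: w = 1 keeps the dimension as-is,
        # w = 0 removes it from the score entirely.
        total += w * log_density
    return total


# Example: the second dimension is badly mismatched but marked unreliable.
clean_score = weighted_log_likelihood([1.0, 5.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0])
masked_score = weighted_log_likelihood([1.0, 5.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0])
```

With all weights set to 1 the function reduces to the usual diagonal-Gaussian log-likelihood, so a decoder using it would behave like a conventional one whenever every feature is judged reliable.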
Related articles
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend
This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system takes advantage of recurrent neural networks (RNNs) throughout the model from the front-end speech enhancement to the language modeling. Three different types of beamforming are used...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
A Two-Channel Acoustic Front-End for Robust Automatic Speech Recognition in Noisy and Reverberant Environments
An acoustic front-end for robust automatic speech recognition in noisy and reverberant environments is proposed in this contribution. It comprises a blind source separation-based signal extraction scheme and only requires two microphone signals. The proposed front-end and its integration into the recognition system are analyzed and evaluated in noisy living room-like environments according to th...
Robust Automatic Speech Recognition for the 4th CHiME Challenge Using Copula-based Feature Enhancement
In this paper, we investigate the application of the copula model for enhancing features in an automatic speech recognition task. We compute a set of utterance-specific nonlinear transformations based on the copula model and use these transformations to obtain the enhanced features for every utterance in the dataset. These features improve the performance of the baseline system by about 4.3%, 1.4%...
Noise Robust Speech Recognition Using Multi-Channel Based Channel Selection And Channel Weighting
In this paper, we study several microphone channel selection and weighting methods for robust automatic speech recognition (ASR) in noisy conditions. For channel selection, we investigate two methods based on the maximum likelihood (ML) criterion and minimum autoencoder reconstruction criterion, respectively. For channel weighting, we produce enhanced log Mel filterbank coefficients as a weight...
Publication date: 2011